---
title: Local LLMs
---

LlamaIndex.TS supports OpenAI and [other remote LLM APIs](/docs/llamaindex/modules/models/llms). You can also run a local LLM on your machine!

## Using a local model via Ollama

The easiest way to run a local LLM is via the great work of our friends at [Ollama](https://ollama.com/), who provide a simple to use client that will download, install and run a [growing range of models](https://ollama.com/library) for you.

### Install Ollama

They provide a one-click installer for Mac, Linux and Windows on their [home page](https://ollama.com/).

### Pick and run a model

Since we're going to be doing agentic work, we'll need a very capable model, but the largest models are hard to run on a laptop. We think `mixtral 8x7b` is a good balance between power and resources, but `llama3` is another great option. You can run Mixtral by running

```bash
ollama run mixtral:8x7b
```

The first time you run it will also automatically download and install the model for you.

### Switch the LLM in your code

To switch the LLM in your code, you first need to make sure to install the package for the Ollama model provider:

```package-install
npm i @llamaindex/ollama
```

Then, to tell LlamaIndex to use a local LLM, use the `Settings` object:

```javascript
import { Settings } from "llamaindex";
import { ollama } from "@llamaindex/ollama";

Settings.llm = ollama({
  model: "mixtral:8x7b",
});
```

### Use local embeddings

If you're doing retrieval-augmented generation, LlamaIndex.TS will also call out to OpenAI to index and embed your data. To be entirely local, you can use a local embedding model from Huggingface like this:

First install the Huggingface model provider package: 

```package-install
npm i @llamaindex/huggingface
```

And then set the embedding model in your code:

```javascript
Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
  quantized: false,
});
```

The first time this runs it will download the embedding model to run it.

### Try it out

With a local LLM and local embeddings in place, you can perform RAG as usual and everything will happen on your machine without calling an API:

```typescript
async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });

  // Split text and create embeddings. Store them in a VectorStoreIndex
  const index = await VectorStoreIndex.fromDocuments([document]);

  // Query the index
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "What did the author do in college?",
  });

  // Output response
  console.log(response.toString());
}

main().catch(console.error);
```

You can see the [full example file](https://github.com/run-llama/LlamaIndexTS/blob/main/examples/vectorIndexLocal.ts).
